Commodity datacenters of inexpensive machines are the computing platforms of choice for a wide range of applications, from online services and search engines to finance and e-science. Networks within datacenters are complex and often chaotic, with nodes sending and receiving data over many different channels. In addition, datacenters are linked by high-speed optical networks that shuttle data to remote mirrors for disaster tolerance, client locality and energy savings. As the networks within and between datacenters increase in capacity and complexity, the commodity 'blade-servers' inside are unable to keep up, either failing to fully utilize the links they are attached to or stalling under traffic spikes. In particular, the protocols running on these machines react to data loss in the network in fundamentally unstable and inefficient ways.
This talk presents two systems for reliable datacenter communication. Ricochet is a reliable multicast protocol for communication within a datacenter and Maelstrom is a transparent proxy for communication between datacenters. Both systems use Forward Error Correction (FEC) techniques in new ways that enable timely and scalable packet recovery, making key choices on where to generate redundant XORs and what to include in them. We show that proactive error correction can be a powerful reliability primitive for constructing fault-tolerant systems that recover rapidly and gracefully from failure.
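To make the idea of a redundant XOR concrete, here is a minimal sketch of generic XOR-parity recovery. It is not the Ricochet or Maelstrom algorithm: the function names (`make_repair`, `recover`), the fixed group of equal-length packets, and the single repair packet per group are all assumptions chosen for illustration. The sender XORs a group of data packets into one repair packet; a receiver that is missing exactly one packet of the group rebuilds it locally, without asking the sender to retransmit.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets: List[bytes]) -> bytes:
    """Build one repair packet: the XOR of every data packet in the group.

    Assumes all packets have the same length (hypothetical simplification).
    """
    return reduce(xor_bytes, packets)


def recover(received: Dict[int, bytes], repair: bytes, group_size: int) -> Dict[int, bytes]:
    """Rebuild a single missing packet from the repair packet.

    `received` maps packet index -> payload. If exactly one index in
    range(group_size) is missing, it is reconstructed; otherwise the
    input is returned unchanged (one XOR cannot repair two losses).
    """
    missing = [i for i in range(group_size) if i not in received]
    if len(missing) != 1:
        return dict(received)
    # XORing the repair packet with every packet that arrived cancels
    # them out and leaves only the missing payload.
    rebuilt = reduce(xor_bytes, received.values(), repair)
    result = dict(received)
    result[missing[0]] = rebuilt
    return result


if __name__ == "__main__":
    group = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
    repair = make_repair(group)
    # Simulate losing packet 2 in the network.
    arrived = {i: p for i, p in enumerate(group) if i != 2}
    print(recover(arrived, repair, len(group))[2])
```

In this sketch the choice of which packets go into each XOR is trivial (one fixed group). The abstract indicates that where repair packets are generated and what they include are the key design decisions in the two systems.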